A general model for collaboration networks 






Tao Zhou, Ying-Di Jin, and Bing-Hong Wang

Nonlinear Science Center and Department of Modern Physics, 
University of Science and Technology of China, Hefei Anhui, 230026, PR China 

Da-Ren He, Pei-Pei Zhang, Yue He, and Bei-Bei Su 
College of Physics Science and Technology, Yangzhou University, Yangzhou Jiangsu, 225002, PR China 

Kan Chen 

Department of Computational Science, Faculty of Science, 
National University of Singapore, Singapore 117543



Zhong-Zhi Zhang
Institute of System Engineering, Dalian University of Technology,
Dalian Liaoning, 116024 PR China

(Dated: February 2, 2008)



In this paper, we propose a general model for collaboration networks. Depending on a single
free parameter, the "preferential exponent", the model interpolates between networks with
scale-free and exponential degree distributions. The degree distributions of the resulting
networks can be roughly classified into four patterns, all of which are observed in empirical
data. The model also exhibits the small-world effect: the corresponding networks have a very
short average distance and a very large clustering coefficient. More interestingly, we find a
peaked act-size distribution in the empirical data of some collaboration networks, which has not
been emphasized before. Our model naturally produces a peaked act-size distribution that agrees
well with the empirical data.

PACS numbers: 89.75.Hc, 64.60.Ak, 84.35.-K, 05.40.-a, 05.50.+q, 87.18.Sn



I. INTRODUCTION 

The last few years have witnessed tremendous activity devoted to the characterization and
understanding of complex networks, which arise in a vast number of natural and artificial
systems, such as the Internet, the World Wide Web, social networks of acquaintance or other
relations between individuals, metabolic networks, food webs, and many others. Owing to the
computerization of data acquisition and the availability of high computing power, scientists
have found that networks in various fields share some common characteristics, which inspires
them to construct general models. Recently, some pioneering works have shed new light on the
mechanisms of network evolution. For instance, Barabási and Albert introduced a scale-free
network model (BA network), which suggests that the two main ingredients of the
self-organization of a network into a scale-free structure are growth and preferential
attachment.

So far, the BA model may be the most successful model for fitting the empirical results of
complex systems, but there are still a great number of real networks whose evolution mechanisms
it cannot explain. In fact, we should not ask for an all-powerful model that explains how every
real network comes into being, since different networks have distinct underlying growth
mechanisms. Therefore, it is meaningful to construct a suitable microscopic model aimed at a
specific kind of network.

A particular class of networks is the so-called collaboration networks, which were regarded as
a kind of social network in early studies. In the social science literature, a collaboration
network is generally defined as a network of actors connected by common membership in groups of
some sort, such as clubs, teams or organizations. Some empirical studies of collaboration
networks have been carried out, including scientific collaboration networks, boards of
directors, movie actor collaboration networks [35], networks of women attending social events
[36], and so on. It is worth pointing out that collaboration networks should not be restricted
to social networks: one instance is the software collaboration network [37], and we will show
more examples of collaboration networks unrelated to social networks in the following text.

Ramasco, Dorogovtsev and Pastor-Satorras have proposed a model for collaboration networks (RDP
model) [38]. In the RDP model, they found power-law behavior in the degree distribution, a
nontrivial clustering-degree correlation and a nontrivial degree-degree correlation. Very
recently, Li et al. established a model for weighted collaboration networks in which both a
power-law weight distribution and a power-law degree distribution are obtained [39].

In this paper, we propose a general model for collaboration networks. Depending on a single
free parameter, the preferential exponent, the model interpolates between networks with
scale-free and exponential degree distributions. The degree distributions of the resulting
networks can be roughly classified into four patterns, all of which are observed in empirical
data. The model also exhibits the small-world effect: the corresponding networks have a very
short average distance and a very large clustering coefficient. More interestingly, we find a
peaked act-size distribution in the empirical data of some collaboration networks, which has not
been emphasized before. Our model naturally produces a peaked act-size distribution that agrees
well with the empirical data.

The present paper is organized as follows. In section 2, we introduce a simple and general
model for collaboration networks. In section 3, we show the small-world effect exhibited by this
model. In section 4, we present the simulation results for the degree distribution and
demonstrate that it approximately follows a stretched exponential distribution [42] with an
adjustable parameter c. In section 5, we show the simulation results and some empirical data for
the act-size distribution, together with a comparison and a qualitative discussion. Finally, in
section 6, we draw the main conclusions of our work.



II. THE MODEL 

Our network starts with m fully connected nodes. Then, at each time step, we add a new node to
the network, which collaborates with some existing nodes. Inspired by the BA model, we assume
that the probability that an existing node x is chosen to be an actor in the collaboration is
proportional to k^a (a ≥ 0), where k is the degree of x and a is the so-called "preferential
exponent", which measures the strength of preferential attachment. For a > 0, we have
preferential attachment.

All the existing nodes chosen for the collaboration link to the new node and to one another;
that is, if two chosen old nodes have never collaborated so far, a new edge is added between
them. Clearly, this model can be extended to a weighted one by using the number of
collaborations between two nodes as the weight of the edge connecting them. Since the aim of
this paper is to introduce the characteristics of non-weighted networks, the simulation and
analysis of weighted networks are not included here and will be published elsewhere.

Note that the act-size is not fixed at each time step, since whether a given node is chosen
does not affect the other nodes. We suppose that a node with degree k is chosen with probability

$$\pi(k) = \frac{A\,k^{a}}{\sum_{i} k_{i}^{a}}, \tag{1}$$

in which A is a constant and $\sum_{i} k_{i}^{a}$ is the normalization factor. Using <s> to
denote the average act-size, such as the mean number of authors per paper, we conclude that
<s> ≈ A + 1, since the number of nodes chosen at each time step has the expected value
Σπ(k) = A and the new node itself also takes part in the act. Thus, A is a parameter that
controls the average act-size of the whole network. Therefore, when we simulate a particular
network of known average act-size, the parameter A is fixed.



III. SMALL WORLD EFFECT

In a network, the distance between two nodes is defined as the number of edges along the
shortest path connecting them. The average distance L of the network is then defined as the
mean distance between two nodes, averaged over all pairs of nodes; it is often considered one
of the most important parameters for measuring the efficiency of communication networks. The
clustering coefficient C(x) of node x is the ratio between the number of edges among the nodes
of A(x) and the total possible number, where A(x) denotes the set of all the neighbors of x.
The clustering coefficient C of the whole network is the average of C(x) over all x. Empirical
studies indicate that most real-life networks have a much smaller average distance (L ~ ln N,
where N is the number of nodes in the network) than completely regular networks and a much
greater clustering coefficient than completely random networks. These two properties, small
average distance and large clustering coefficient, make up the so-called small-world effect.

Inspired by the empirical studies on real-life networks, Watts and Strogatz proposed a
one-parameter model (WS model) that interpolates between an ordered finite-dimensional lattice
and a random graph by randomly rewiring each edge of the regular lattice with probability p. In
the WS model, L scales logarithmically with N, and the clustering coefficient decreases with N,
which is in excellent agreement with the characteristics of real networks. The pioneering
article of Watts and Strogatz started an avalanche of research on the properties of small-world
networks. In this section, we demonstrate that the networks generated by the present rules
display the small-world effect.

FIG. 1: The dependence of the average distance L on the network size N. One can see that L
increases very slowly as N increases. The main plot shows L as a function of ln N, which is
well fitted by a straight line. The curve lies above the fitting line for N < 4000 and below it
for N > 5000, which indicates that the growth of L can be approximated as ln N and is in fact a
little slower than ln N. The inset shows the average distance L vs ln ln N; the error of the
linear fit in ln ln N is smaller than that in ln N, indicating that the networks may be
considered ultrasmall-world networks. All the data are obtained from 10 independent simulations
with parameters a = 1 and A = 3.

FIG. 2: The clustering coefficient vs network size N. The main plot and inset show the
dependence of the clustering coefficient C on the network size N. One can see clearly that the
clustering coefficient of the present networks is sufficiently large even for large N.

First, we study the average distance of the present model using an approach similar to that in
Refs. [40, 41]. Using d(i,j) to denote the distance between nodes i and j, the average distance
of the present networks of order N, denoted by L(N), is defined as

$$L(N) = \frac{2\sigma(N)}{N(N-1)}, \tag{2}$$

where the total distance is

$$\sigma(N) = \sum_{1 \le i < j \le N} d(i,j). \tag{3}$$

The distance between two existing nodes does not increase as the network size N increases, thus
we have

$$\sigma(N+1) \le \sigma(N) + \sum_{i=1}^{N} d(i,N+1). \tag{4}$$

Assume that h existing nodes, x_1, x_2, ..., x_h, are selected to collaborate with the new node
N+1; then d(i,N+1) is equal to the minimal distance between i and any one of the h nodes, plus 1:

$$d(i,N+1) = \min\{d(i,x_j) \mid j = 1,2,\dots,h\} + 1. \tag{5}$$

As a rough estimate, the sum $\sum_{i=1}^{N} \min_j d(i,x_j)$ can be expressed approximately in
terms of L(N-h+1):

$$\sum_{i=1}^{N} \min_{j}\, d(i,x_j) \approx (N-h)\,L(N-h+1). \tag{6}$$

In order to keep the network connected, we always set A > 1 and compel h ≥ 1, which leads to
(N-h)L(N-h+1) ≤ (N-1)L(N). Combining the results above, and using (N-1)L(N) = 2σ(N)/N from
Eq. (2), we have

$$\sigma(N+1) \le \sigma(N) + N + \frac{2\sigma(N)}{N}. \tag{7}$$

Treating (7) as an equation, the growth of σ(N) is governed by

$$\frac{d\sigma(N)}{dN} = N + \frac{2\sigma(N)}{N}, \tag{8}$$

which leads to

$$\sigma(N) = N^{2}\ln N + H N^{2}, \tag{9}$$

where H is a constant. As σ(N) ~ N^2 L(N), we have L(N) ~ ln N. It should be noted that, since
(7) is in fact an inequality, the actual growth of L may be somewhat slower than ln N.
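As a check on the step from Eq. (8) to Eq. (9), the linear equation can be integrated directly. The short derivation below is a worked sketch, not part of the original argument; it shows that the integration constant H enters multiplied by N^2, which leaves the leading N^2 ln N behaviour, and hence L(N) ~ ln N, unchanged.

```latex
\begin{align*}
\frac{d\sigma}{dN} - \frac{2\sigma}{N} &= N
  && \text{(Eq. (8) rearranged)} \\
\frac{d}{dN}\!\left(\frac{\sigma}{N^{2}}\right)
  &= \frac{1}{N^{2}}\frac{d\sigma}{dN} - \frac{2\sigma}{N^{3}} = \frac{1}{N}
  && \text{(multiply by } N^{-2}\text{)} \\
\frac{\sigma}{N^{2}} &= \ln N + H
  \;\Longrightarrow\; \sigma(N) = N^{2}\ln N + H N^{2} \\
L(N) &= \frac{2\sigma(N)}{N(N-1)} \approx 2\ln N + 2H \sim \ln N
  && \text{(large } N\text{)}
\end{align*}
```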

In figure 1, we report a typical simulation result for the average distance of the present
networks with parameters a = 1 and A = 3, which agrees well with the analytic result.
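The growth rule of Section II, on which such simulations rely, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `grow_collaboration_network` is hypothetical, probabilities from Eq. (1) that exceed 1 are treated as certain selection, and the constraint h ≥ 1 is enforced by drawing one node by weight when nobody is chosen (the text does not say how h ≥ 1 is compelled).

```python
import random


def grow_collaboration_network(n_nodes, m=3, a=1.0, A=3.0, seed=0):
    """Grow one network; return (adjacency dict of sets, list of act sizes)."""
    rng = random.Random(seed)
    # Initial network: m fully connected nodes (m >= 2 keeps every degree positive).
    adj = {i: {j for j in range(m) if j != i} for i in range(m)}
    act_sizes = []
    for new in range(m, n_nodes):
        nodes = list(adj)
        weights = [len(adj[x]) ** a for x in nodes]
        total = sum(weights)
        # Eq. (1): every old node is chosen independently with probability
        # A * k^a / sum_i k_i^a (a value above 1 means "always chosen").
        chosen = [x for x, w in zip(nodes, weights)
                  if rng.random() < A * w / total]
        if not chosen:
            # Assumption: enforce h >= 1 by drawing a single node by weight.
            chosen = rng.choices(nodes, weights=weights, k=1)
        act = chosen + [new]
        adj[new] = set()
        # All actors of the act, including the new node, become mutually linked.
        for u in act:
            adj[u].update(v for v in act if v != u)
        act_sizes.append(len(act))
    return adj, act_sizes


if __name__ == "__main__":
    network, sizes = grow_collaboration_network(1000, m=3, a=1.0, A=3.0, seed=1)
    print("nodes:", len(network))
    print("mean act-size:", sum(sizes) / len(sizes))  # expected to be close to A + 1
```

Recomputing all weights at every step costs O(N) per step, which is adequate for the sizes quoted in the text (N up to a few thousand); a larger study would want an incremental update of the normalization sum.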

Next, let us discuss the clustering coefficient. As mentioned above, for an arbitrary node x,
the clustering coefficient C(x) is

$$C(x) = \frac{2E(x)}{k(x)\,(k(x)-1)}, \tag{10}$$

where E(x) is the number of edges between nodes in the neighbor set A(x) of node x, and
k(x) = |A(x)| denotes the degree of node x. The clustering coefficient C of the whole network
is defined as the average of C(x) over all nodes.

In figure 2, we report the simulation results for the clustering coefficient of the present
networks vs network size. From figure 2, one can see that the clustering coefficient of the
present networks is sufficiently large even for large N. Therefore, our model exhibits a
clustering structure completely different from that of BA networks, in which the clustering
coefficient is very small and decreases with increasing network size N.

FIG. 3: The clustering coefficient vs preferential exponent a. The two curves show the
clustering coefficient as a function of a with the network size N = 5000 fixed; both increase
monotonically with a. The main plot and inset are for A = 4.0 and A = 6.0, respectively. It is
clear that the clustering coefficient is sensitive to the preferential exponent but is
influenced little by A.

FIG. 4: The degree distribution of the scientific collaboration network, where p(k) denotes the
probability that a randomly selected node has degree k and $P(k) = \int_{k}^{\infty} p(k)\,dk$
is the cumulative probability. The main plot is the degree distribution. The lower-left inset
shows how the quantity ln(-ln P(k)) behaves as a function of ln k, which can be approximately
fitted by a straight line of slope 0.73 ± 0.02; thus these data obey an SED with c ≈ 0.73 (see
Eq. (A3)). The upper-right inset shows k^c vs ln P(k), which approximates a line with negative
slope.

In addition, we plot the clustering coefficient as a function of a, with the network size
N = 5000 fixed, in figure 3. The two curves with different A are almost the same; thus the
clustering coefficient is influenced little by A. Both curves increase monotonically with a.
Since a represents the strength of preferential attachment, this reveals that a larger
difference in attractiveness between dominant and weak individuals leads to stronger clustering.

Even when the network grows without preferential attachment (i.e. a = 0), the clustering
coefficient of our model is much greater than that of completely random networks, because of
the special linking mode proposed here. For a > 1.5, the clustering coefficient approaches 1,
and the structure of the corresponding networks is similar to a star in topology [45, 46]. The
difference is that in our networks with very large a, the central part is not a single node, as
in a star, but many nodes almost fully connected to each other. Since the structure of networks
with a > 1.5 is very different from reality, we will not discuss their characteristics
hereinafter.

In summary, the present networks have both a very large clustering coefficient and a very small
average distance, which agrees well with previous empirical studies.
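Both quantities discussed in this section can be measured directly from an adjacency structure. The sketch below is illustrative only: the function names are hypothetical, the graph is assumed to be connected and stored as a dict mapping each node to the set of its neighbors, and nodes of degree below 2 are assigned C(x) = 0, a convention the text does not specify.

```python
from collections import deque


def average_distance(adj):
    """Mean shortest-path length over all node pairs (breadth-first search)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    # Ordered pairs are counted twice in both numerator and denominator.
    return total / pairs


def clustering_coefficient(adj):
    """Average of C(x) = 2 E(x) / (k(x) (k(x) - 1)) over all nodes, as in Eq. (10)."""
    total = 0.0
    for x, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue  # convention assumed here: C(x) = 0 when k(x) < 2
        # Each edge among the neighbors is seen from both of its ends.
        links = sum(1 for u in neighbors for v in adj[u] if v in neighbors) // 2
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant node 3 attached to node 0.
    toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print(average_distance(toy), clustering_coefficient(toy))
```

For the toy graph, the six pair distances sum to 8 and the node coefficients are 1/3, 1, 1 and 0, which gives a quick hand check of both functions.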



IV. DEGREE DISTRIBUTION 

The degree distributions of real-life networks are various [47, 48]. Some of them, such as the
acquaintance network of Mormons [49], are Gaussian; some, such as the power grid of southern
California, are exponential [35]; and some, such as the network of the World Wide Web, are
power-law. However, the degree distributions of most real-life networks do not obey these
simple forms: they may interpolate between Gaussian and exponential forms, such as the network
of world airports [47], or between exponential and power-law forms, such as citation networks
in high energy physics [50], or take another form [51].

In this section, we focus on the empirical results for collaboration networks. About four years
ago, Newman investigated the statistical properties of scientific collaboration networks. He
demonstrated that the degree distribution can be well fitted by a truncated power law of the
form $p(k) \sim k^{-\tau} e^{-k/k_c}$ [29], or considered as a double power law [28]. Csanyi and
Szendroi have investigated the acquaintance networks from the WIW project, where a double power
law is also detected [32]. In fact, Lehmann et al. have shown an example where the observed
double power law can be well fitted by a stretched exponential form [50]. Appendix A gives the
details of the stretched exponential distribution (SED), including its definition, its basic
properties, its relations to the exponential and power-law distributions, and the reason why we
use the SED in this paper. Figure 4 shows the degree distribution of the scientific
collaboration network studied by Newman [28, 29], which can be well fitted by an SED with
c = 0.73, indicating that this distribution is closer to an exponential form than to a
power-law form. Another famous example is the collaboration network of movie actors [35], which
displays a power law only in its middle region. This distribution is also consistent with a
stretched exponential form with c = 0.45 (see figure 5).

FIG. 5: The degree distribution of the actor collaboration network. The main plot is the degree
distribution, which displays a power law only in the interval of about k ∈ [50, 1000]. The
solid line has slope -2 for comparison. These data can be approximately fitted by an SED with
c = 0.45 ± 0.02. The inset shows k^c vs ln P(k).

We have done some empirical as well as theoretical work on collaboration networks and found
that the degree distributions of many real-life collaboration networks in various fields
approximately obey the stretched exponential form [52]. For example, if we consider traveling
sites as actors and traveling routes that contain several sites as acts, then the recommended
traveling routes from the websites Walkchina and Chinavista in the year 2003 form a Chinese
touristry collaboration network, whose degree distribution is accurately consistent with an SED
with c = 0.50 [52, 53].

Next, let us discuss the degree distribution of the networks generated by our model. Since
neither the number of selected nodes nor the number of new edges is fixed at each time step, it
is hard to obtain analytic results. For comparison, we give analytic results for a special case
of this model in Appendix B; here, only the numerical results are shown. In figure 6, we report
a typical simulation result with a = 1.0 and A = 4. The degree distribution is similar to that
of the movie actor collaboration network and can be well fitted by a stretched exponential form
with c = 0.34. We have also investigated how the two parameters affect the degree distribution.
In figure 7, one can see that the parameter c of the SED decreases monotonically from 0.98 to
0.14 as a increases: a smaller a corresponds to a "more exponential" network, while a larger
one corresponds to a "more power-law" one. As mentioned above, for a > 1.5 the networks are
star-like, with the degree of the hub node (i.e. the node of maximal degree) exceeding half of
the network size; this has not been observed in real-life collaboration networks and will not
be discussed further either. To give an intuitive picture of the degree distributions of the
present networks, we roughly classify them into four patterns: exponential, arsy-varsy,
semi-power law and power law. In figure 8, we show representative instances of the four
patterns. There is no unambiguous borderline between two neighboring patterns. We have also
checked that the parameter A has little effect on the overall shape of the degree distribution;
a larger A only makes the head larger for sufficiently large N.

FIG. 6: A typical simulation result for the degree distribution with N = 5000, a = 1.0 and
A = 4. The main plot is the average of 100 independent simulations. The degree distribution
exhibits power-law behavior in its middle region, which is similar to the case of the movie
actor collaboration network (see figure 5 or the upper-right inset for comparison). The
lower-left inset shows k^0.34 vs ln P(k), which approximates a line with negative slope,
indicating that the corresponding degree distribution can be well fitted by a stretched
exponential form with c = 0.34.

FIG. 7: The parameter c of the SED as a function of the preferential exponent a. The value of c
decreases monotonically from 0.98 to 0.14 as a increases. All the data are averages of 100
independent simulations, where N = 5000 and A = 4 are fixed. The patterns of the degree
distributions for different a can be roughly classified into four types, each marked by a
different symbol: exponential, arsy-varsy, semi-power law and power law.

FIG. 8: Representative instances of the four patterns of degree distribution. Exponential
(a = 0.0): the degree distribution obeys an exponential form except in its tail; arsy-varsy
(a = 0.4): the degree distribution exhibits neither a clear exponential nor a clear power law;
semi-power law (a = 0.9): the degree distribution exhibits power-law behavior only in its
middle region; power law (a = 1.5): the degree distribution displays a power law over the whole
range except for a ridgy head and a fat tail.

In short, many real-life collaboration networks have degree distributions lying between
exponential and power-law forms that can be well fitted by a stretched exponential form, and
the present model can generate networks with degree distributions ranging from "almost
exponential" to "almost power-law", covering four patterns.



V. ACT-SIZE DISTRIBUTION 

Besides the degree distribution, the act-size distribution is another characteristic
distribution, one that is specific to collaboration networks. In many real-life cases, this
distribution is single-peaked and decays exponentially. One famous instance is the network of
corporate directors [54], in which the act-size distribution, defined as the number of
directors per board, is single-peaked (see figure 8 in ref. [48]). We have also done some
empirical work on the act-size distribution of collaboration networks [52]. All of these
networks, including the Chinese touristry collaboration network, the bus route network, the
scientific collaboration network, and so on, exhibit a single-peaked act-size distribution. In
figure 9, we show two examples, the Chinese touristry collaboration network and a scientific
collaboration network. In the former case, the act-size is the number of traveling sites per
traveling route; the latter contains only the 2062 papers in Vol. 93 of Physical Review
Letters, where each paper is considered as an act and the act-size is the number of authors.
There are 98 papers with more than 20 authors, which are not shown in figure 9. Both
distributions are single-peaked and approximately exponential in form.

However, the act-size distribution seems less attractive than the degree distribution, and thus
the observed peaked behavior has not been emphasized before. It is usually ignored, or
considered only as an extrinsic factor that has nothing to do with, and is not affected by, the
evolutionary mechanism of networks. In our model, the act-size distribution is not imposed from
a static perspective, like the degree distribution of the configuration model, but is an
inseparable part of the dynamical mechanism of network evolution. It is clear that, when a = 0,
for sufficiently large N, the act-size distribution is a Poisson distribution, single-peaked
and decaying approximately exponentially. In contrast to the case of the degree distribution,
the numerical study indicates that the act-size distribution is insensitive to a.
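The Poisson limit for a = 0 can be illustrated numerically. In that case Eq. (1) gives every one of the N existing nodes the same selection probability A/N, so the number h of chosen nodes is binomially distributed and the act-size is s = h + 1. The sketch below compares this binomial law with a Poisson law of mean A; the function names are hypothetical, and the constraint h ≥ 1 is ignored for simplicity.

```python
import math


def binomial_pmf(n, p, h):
    """Probability that exactly h of n independent trials succeed."""
    return math.comb(n, h) * p ** h * (1.0 - p) ** (n - h)


def poisson_pmf(mean, h):
    """Poisson probability of h events for the given mean."""
    return math.exp(-mean) * mean ** h / math.factorial(h)


def max_gap(n, A, h_max=30):
    """Largest pointwise difference between Binomial(n, A/n) and Poisson(A)."""
    p = A / n
    return max(abs(binomial_pmf(n, p, h) - poisson_pmf(A, h))
               for h in range(h_max + 1))


if __name__ == "__main__":
    n, A = 5000, 2.0
    print("largest deviation from Poisson:", max_gap(n, A))
    for h in range(8):
        # The act-size is s = h + 1 because the new node always takes part.
        print("s =", h + 1, " p(s) ~", round(poisson_pmf(A, h), 4))
```

The printed table rises to a peak and then falls off quickly, which is the single-peaked shape described in the text.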

A typical simulation result is shown in figure 10; one can compare it with the empirical data
for scientific collaboration networks (see figures 9b and 9d). We set a = 0.4 since it makes
the parameter c of the two networks nearly equal. Clearly, the act-size distribution generated
by our model is qualitatively consistent with the real-life one.

FIG. 9: Empirical results for the act-size distribution. Panels a and b show the act-size
distributions of the Chinese touristry collaboration network and the scientific collaboration
network of Physical Review Letters, respectively. Both distributions display clearly
single-peaked behavior. Panels c and d are the corresponding cumulative distributions for the
two networks. The red solid curves are fits of exponential form. In these four plots, the
symbols s, p(s) and $P(s) = \int_{s}^{\infty} p(s)\,ds$ denote the act-size, the probability
that a randomly selected act is of size s, and the cumulative probability, respectively.

FIG. 10: A typical simulation result for the act-size distribution with N = 5000, a = 0.4 and
A = 2.0. The data are averages of 100 independent simulations. The main plot exhibits clearly
single-peaked behavior. The inset shows the corresponding cumulative distribution, which can be
well fitted by an exponential function (see the red solid curve).



VI. CONCLUSION

In summary, we have constructed a general model for collaboration networks, the basic
constituents of which are preferential attachment and particular selecting and linking rules
aimed at collaboration networks. The present networks have both a very large clustering
coefficient and a very small average distance, which is consistent with the previous empirical
results that collaboration networks display the small-world effect. We argue that the degree
distributions of many real-life collaboration networks may appropriately be fitted by a
stretched exponential form. Numerical study indicates that the degree distribution of the
present networks can be well fitted by a stretched exponential form, with the parameter c
decreasing from 0.98 to 0.14 as a increases. We roughly classify the degree distributions of
our model into four patterns: exponential (bus route network in Beijing [52]), arsy-varsy
(scientific collaboration network [28, 29]), semi-power law (movie actor collaboration network
[35] and Chinese touristry collaboration network [52, 53]) and power law (bus route network in
Yangzhou [52]), all of which are observed in the empirical data. More interestingly, we find
that the act-size distribution is single-peaked and decays exponentially, which is reproduced
naturally by our model.

Although this model is simple and rough, it offers a good starting point for explaining the
existing empirical data and can be easily extended when more factors that may affect network
evolution are considered. In addition, the model can be extended to a weighted network model if
the edge weight is used to represent the number of collaborations between the corresponding two
nodes. Further statistical properties of the present networks, such as the degree-degree
correlation, the clustering-degree correlation and so on, have also been investigated and will
be published elsewhere.

APPENDIX A: POWER LAW AND STRETCHED
EXPONENTIAL DISTRIBUTION

Frequency or probability distribution functions (PDF) that decay as a power law have acquired a
special status in the last decade. A power-law distribution p(x) characterizes the absence of a
characteristic size, independently of the value of x. In contrast, an exponential, for
instance, or any other functional dependence does not enjoy this self-similarity. In other
words, a power-law PDF is such that there is the same proportion of smaller and larger events,
whatever the size one is looking at within the power-law range. Since the power-law
distribution has repeatedly been claimed to describe many natural phenomena and has been
proposed to apply to a vast set of social and economic statistics, it is considered one of the
most striking signatures of complex dependence. Empirically, a power-law PDF is represented by
a linear dependence in a double logarithmic plot of the frequency or cumulative number as a
function of size. However, logarithms are notorious for contracting data, and the qualification
of a power law is not as straightforward as often believed. In addition, log-log plots of data
from natural and economic phenomena often exhibit a limited linear regime followed by a
significant curvature. Laherrère and Sornette [42] explored and tested the hypothesis that the
curvature observed in log-log plots of the distributions of several data sets taken from
natural and economic phenomena might result from a deeper departure from the power-law paradigm
and might call for an alternative description over the whole range of the distribution. Thus,
they proposed a stretched exponential distribution (SED):

$$p(x)\,dx = c\,\frac{x^{c-1}}{x_0^{c}}\,\exp\!\left[-(x/x_0)^{c}\right]dx, \tag{A1}$$

such that the cumulative distribution is

$$P(x) = \exp\!\left[-(x/x_0)^{c}\right]. \tag{A2}$$

Stretched exponentials are characterized by an exponent c smaller than one. The borderline
c = 1 corresponds to the usual exponential distribution. For c smaller than one, the
distribution presents a clear curvature in a log-log plot. For the reasons discussed above, we
use the SED to fit the degree distribution of our model. In the simulations, plotting k^c
against the logarithm of the cumulative probability, ln P(k), yields a line with negative slope
if the degree strictly follows a stretched exponential distribution, since Eq. (A2) gives
ln P(k) = -(k/k_0)^c.

For numerical work, we write down the equivalent form of Eq. (A2):

$$\ln(-\ln P(k)) = c\ln k - c\ln k_0. \tag{A3}$$

Using ln k as the x-axis and ln(-ln P(k)) as the y-axis, if the corresponding curve can be well
fitted by a straight line, then the slope is the value of c.
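The fitting procedure based on Eq. (A3) can be sketched as an ordinary least-squares line through the points (ln k, ln(-ln P(k))). The code below is a minimal illustration with hypothetical function names; it takes P(k) to be the fraction of nodes with degree at least k, and it skips points with P(k) = 1, for which ln(-ln P) is undefined.

```python
import math
from collections import Counter


def cumulative_distribution(degrees):
    """Return sorted (k, P(k)) pairs, with P(k) the fraction of nodes of degree >= k."""
    n = len(degrees)
    counts = Counter(degrees)
    remaining = n
    points = []
    for k in sorted(counts):
        points.append((k, remaining / n))
        remaining -= counts[k]
    return points


def fit_sed(points):
    """Least-squares fit of Eq. (A3); returns the pair (c, k0)."""
    xs, ys = [], []
    for k, P in points:
        if k > 0 and 0.0 < P < 1.0:  # ln(-ln P) needs 0 < P < 1
            xs.append(math.log(k))
            ys.append(math.log(-math.log(P)))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c = sxy / sxx                      # slope of the fitted line
    intercept = mean_y - c * mean_x    # equals -c ln k0 by Eq. (A3)
    return c, math.exp(-intercept / c)


if __name__ == "__main__":
    # Synthetic check: points taken from an exact SED with c = 0.5 and k0 = 10.
    exact = [(k, math.exp(-((k / 10.0) ** 0.5))) for k in range(1, 200)]
    print(fit_sed(exact))
```

For real data, `fit_sed(cumulative_distribution(degrees))` gives the estimate of c; the unweighted fit treats all points equally, so noisy tail points can influence the slope noticeably.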



APPENDIX B: A SPECIAL MODEL FOR
COLLABORATION NETWORKS OF FIXED
ACT-SIZE

In a very special case, the act-size is fixed. For example, if the four players in a bridge
game are considered as actors in one act, then the act-size is 4. For comparison, in this
appendix we introduce a solvable model for this special case.

This model starts with an m-complete network [62], where m ≥ 2. At each time step, a new node
is added and linked to all the nodes of a randomly selected m-complete network. Under these
rules, not only the act-size but also the number of new edges in each time step is fixed, which
makes the model very easy to analyze.

Each time a new node is added to the network, the number of m-complete networks K_m increases
by m; thus, when the network is of order N, the number of K_m is
N_m = 1 + m(N - m) = mN - m^2 + 1. Note that when a given node's degree increases by one, the
number of K_m containing this node increases by m - 1. Since a newly added node has degree m
and belongs to m of them, any node with degree k belongs to
φ_k = m + (k - m)(m - 1) = km - k - m^2 + 2m m-complete networks. Let n(N, k) be the number of
nodes with degree k when N nodes are present. When we add a new node to the network, n(N, k)
evolves according to the following rate equation [63]:

$$n(N+1,k+1) = n(N,k)\,\frac{\varphi_k}{N_m} + n(N,k+1)\left(1 - \frac{\varphi_{k+1}}{N_m}\right). \tag{B1}$$

When N is large enough, n(N, k) can be approximated by Np(k), where p(k) is the probability
density function of the degree distribution. In terms of p(k), the above equation can be
rewritten as

$$p(k+1) = \frac{N}{N_m}\left[p(k)\,\varphi_k - p(k+1)\,\varphi_{k+1}\right]. \tag{B2}$$

Using the expression p(k+1) - p(k) = dp/dk, we obtain the continuous form of Eq. (B2):

$$p(k+1) + \frac{N}{N_m}\left[(km - k - m^{2} + 2m)\,\frac{dp}{dk} + (m-1)\,p(k+1)\right] = 0. \tag{B3}$$

For N ≫ k ≫ m, this equation leads to $p(k) \propto k^{-\gamma}$ with
$\gamma = \frac{2m-1}{m-1} \in (2,3]$. The simulation result agrees accurately with the
analytic one for large network size N.

In addition, there is a one-to-one correspondence between a node's degree and its clustering
coefficient:

$$C(k) = \frac{(m-1)(2k-m)}{k(k-1)}. \tag{B4}$$

The clustering coefficient of the whole network can be obtained as the mean value of C(k) with
respect to the degree distribution p(k):

$$C = \int_{k_{\min}}^{k_{\max}} C(k)\,p(k)\,dk, \tag{B5}$$

where k_min = m is the minimal degree and k_max ≫ k_min is the maximal degree. Combining
Eq. (B4) and Eq. (B5), and noting that $p(k) = A k^{-\gamma}$ with the normalization
$\int_{k_{\min}}^{k_{\max}} p(k)\,dk = 1$, we can obtain the analytical result for C by
approximately treating k_max as +∞. For example, when m = 2, 3, 4, 5 the clustering
coefficients are 0.739, 0.813, 0.851 and 0.875. Furthermore, many real-life networks are
characterized by the existence of a hierarchical structure [64], which can usually be detected
by a negative correlation between the clustering coefficient and the degree. The BA network,
which does not possess a hierarchical structure, is known to have a clustering coefficient C(x)
of node x that is independent of its degree k(x), while the present network has been shown to
have C(k) ~ k^{-1}, in accord with the observations of many real networks [64].
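The fixed act-size model of this appendix is simple enough to simulate directly, which gives an independent check of Eq. (B4). The sketch below is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, the list of m-complete subgraphs is tracked explicitly (each new node contributes m of them), and nodes of degree below 2 are given C(x) = 0.

```python
import random


def grow_fixed_act_size(n_nodes, m=2, seed=0):
    """Attach every new node to all m nodes of a randomly chosen m-clique."""
    rng = random.Random(seed)
    adj = {i: {j for j in range(m) if j != i} for i in range(m)}
    cliques = [tuple(range(m))]  # all m-complete subgraphs created so far
    for new in range(m, n_nodes):
        base = rng.choice(cliques)
        adj[new] = set(base)
        for u in base:
            adj[u].add(new)
        # The new node forms one new m-clique with each (m-1)-subset of base.
        for skip in range(m):
            cliques.append(base[:skip] + base[skip + 1:] + (new,))
    return adj, cliques


def neighbor_links(adj, x):
    """Number of edges E(x) among the neighbors of node x."""
    neighbors = adj[x]
    return sum(1 for u in neighbors for v in adj[u] if v in neighbors) // 2


def mean_clustering(adj):
    """Average of 2 E(x) / (k (k - 1)) over all nodes (0 if k < 2)."""
    total = 0.0
    for x in adj:
        k = len(adj[x])
        if k >= 2:
            total += 2.0 * neighbor_links(adj, x) / (k * (k - 1))
    return total / len(adj)


if __name__ == "__main__":
    m = 2
    network, cliques = grow_fixed_act_size(5000, m=m, seed=1)
    # Eq. (B4) rewritten without division: 2 E(x) = (m - 1)(2 k - m).
    exact = all(2 * neighbor_links(network, x) == (m - 1) * (2 * len(network[x]) - m)
                for x in network)
    print("Eq. (B4) holds for every node:", exact)
    print("mean clustering coefficient:", mean_clustering(network))
```

Storing every clique costs memory proportional to mN, which is fine at these sizes; the per-node check of Eq. (B4) is exact, whereas the network-wide mean fluctuates from run to run.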